Call Stata using IPython magic commands

In an IPython environment, you can use the stata and mata magic commands to interact with Stata and Mata. Below, we include some examples to get you started using these magics. For more detailed information on the magic commands, see Magic commands.

The stata magic

The stata magic is used to execute Stata commands. It can be used as both a line magic, %stata, and a cell magic, %%stata. In other words, with %stata, you can run a single-line command (just as you would in the Stata Command window), and with %%stata, you can run multiple Stata commands from a cell at once (as you would in a do-file).

First, we load the auto dataset and describe its contents.

[2]:
%%stata
sysuse auto, clear
describe

. sysuse auto, clear
(1978 automobile data)

. describe

Contains data from C:\Program Files\Stata18/ado\base/a/auto.dta
 Observations:            74                  1978 automobile data
    Variables:            12                  13 Apr 2022 17:45
                                              (_dta has notes)
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
make            str18   %-18s                 Make and model
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g                 Repair record 1978
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g                 Trunk space (cu. ft.)
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g                 Length (in.)
turn            int     %8.0g                 Turn circle (ft.)
displacement    int     %8.0g                 Displacement (cu. in.)
gear_ratio      float   %6.2f                 Gear ratio
foreign         byte    %8.0g      origin     Car origin
-------------------------------------------------------------------------------
Sorted by: foreign

.

Next, we fit a linear regression.

[3]:
%%stata
reg mpg price i.foreign

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     23.01
       Model |  960.866305         2  480.433152   Prob > F        =    0.0000
    Residual |  1482.59315        71  20.8815937   R-squared       =    0.3932
-------------+----------------------------------   Adj R-squared   =    0.3761
       Total |  2443.45946        73  33.4720474   Root MSE        =    4.5696

------------------------------------------------------------------------------
         mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       price |   -.000959   .0001815    -5.28   0.000     -.001321    -.000597
             |
     foreign |
    Foreign  |   5.245271   1.163592     4.51   0.000     2.925135    7.565407
       _cons |   25.65058   1.271581    20.17   0.000     23.11512    28.18605
------------------------------------------------------------------------------

The %%stata magic command offers several arguments that can be used to control the execution of Stata’s commands and to pass data between Stata and Python. For example, you can use the -eret, -ret, or -sret argument to push Stata’s e(), r(), or s() results to Python.

[4]:
%%stata -eret myeret
reg mpg price i.foreign

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     23.01
       Model |  960.866305         2  480.433152   Prob > F        =    0.0000
    Residual |  1482.59315        71  20.8815937   R-squared       =    0.3932
-------------+----------------------------------   Adj R-squared   =    0.3761
       Total |  2443.45946        73  33.4720474   Root MSE        =    4.5696

------------------------------------------------------------------------------
         mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       price |   -.000959   .0001815    -5.28   0.000     -.001321    -.000597
             |
     foreign |
    Foreign  |   5.245271   1.163592     4.51   0.000     2.925135    7.565407
       _cons |   25.65058   1.271581    20.17   0.000     23.11512    28.18605
------------------------------------------------------------------------------

The code above passes the e() results returned by Stata’s regress command to Python as a dictionary named myeret. The keys of the dictionary are Stata’s macro and scalar names, and the values are their corresponding values. Here are its contents.

[5]:
myeret
[5]:
{'e(N)': 74.0,
 'e(df_m)': 2.0,
 'e(df_r)': 71.0,
 'e(F)': 23.007494485746342,
 'e(r2)': 0.39324012569622946,
 'e(rmse)': 4.569638248831391,
 'e(mss)': 960.8663049714787,
 'e(rss)': 1482.5931544879809,
 'e(r2_a)': 0.3761482982510528,
 'e(ll)': -215.90831771275379,
 'e(ll_0)': -234.39433764823468,
 'e(rank)': 3.0,
 'e(cmdline)': 'regress mpg price i.foreign',
 'e(title)': 'Linear regression',
 'e(marginsprop)': 'minus',
 'e(marginsok)': 'XB default',
 'e(vce)': 'ols',
 'e(_r_z_abs__CL)': '|t|',
 'e(_r_z__CL)': 't',
 'e(depvar)': 'mpg',
 'e(cmd)': 'regress',
 'e(properties)': 'b V',
 'e(predict)': 'regres_p',
 'e(model)': 'ols',
 'e(estat_cmd)': 'regress_estat',
 'e(b)': array([[-9.59034169e-04,  0.00000000e+00,  5.24527100e+00,
          2.56505843e+01]]),
 'e(V)': array([[ 3.29592449e-08,  0.00000000e+00, -1.02918123e-05,
         -2.00142479e-04],
        [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         -0.00000000e+00],
        [-1.02918123e-05,  0.00000000e+00,  1.35394617e+00,
         -3.39072871e-01],
        [-2.00142479e-04, -0.00000000e+00, -3.39072871e-01,
          1.61691892e+00]]),
 'e(beta)': array([[-0.4889233,  0.       ,  0.4172175]])}

Stata matrices (like e(b) and e(V)) are converted to NumPy arrays.

[6]:
e_v = myeret['e(V)']
e_v
[6]:
array([[ 3.29592449e-08,  0.00000000e+00, -1.02918123e-05,
        -2.00142479e-04],
       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        -0.00000000e+00],
       [-1.02918123e-05,  0.00000000e+00,  1.35394617e+00,
        -3.39072871e-01],
       [-2.00142479e-04, -0.00000000e+00, -3.39072871e-01,
         1.61691892e+00]])

Stata’s graphs can also be displayed in the IPython environment. Here we create a scatterplot of car mileage against price by using the %stata line magic.

[7]:
%stata scatter mpg price
../_images/notebook_Quick_Start1_11_0.svg

You can load a NumPy array or pandas DataFrame into Stata, making it the current working dataset, by specifying the %%stata magic’s -d argument. To demonstrate, we first load a Boston housing price dataset from the scikit-learn package into a pandas DataFrame named boston.

[8]:
from sklearn import datasets
import pandas as pd

bos = datasets.load_boston()
boston = pd.DataFrame(bos.data)
boston.columns = bos.feature_names
boston['MEDV'] = bos.target
boston.head()
[8]:
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT MEDV
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98 24.0
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14 21.6
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03 34.7
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94 33.4
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33 36.2

Next, we load boston into Stata as the current dataset in memory and describe its contents. Here you can also use the API function pdataframe_to_data() of the stata module to load the pandas DataFrame into Stata. See Call Stata using API functions and Example 5 for more information.

[9]:
%%stata -d boston
describe

Contains data
 Observations:           506
    Variables:            14
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
CRIM            double  %10.0g
ZN              double  %10.0g
INDUS           double  %10.0g
CHAS            double  %10.0g
NOX             double  %10.0g
RM              double  %10.0g
AGE             double  %10.0g
DIS             double  %10.0g
RAD             double  %10.0g
TAX             double  %10.0g
PTRATIO         double  %10.0g
B               double  %10.0g
LSTAT           double  %10.0g
MEDV            double  %10.0g
-------------------------------------------------------------------------------
Sorted by:
     Note: Dataset has changed since last saved.

Then, we fit a linear regression on the median value of owner-occupied homes, MEDV.

[10]:
%%stata
regress MEDV CRIM RM AGE

      Source |       SS           df       MS      Number of obs   =       506
-------------+----------------------------------   F(3, 502)       =    216.13
       Model |  24076.2612         3   8025.4204   Prob > F        =    0.0000
    Residual |  18640.0342       502  37.1315422   R-squared       =    0.5636
-------------+----------------------------------   Adj R-squared   =    0.5610
       Total |  42716.2954       505  84.5867236   Root MSE        =    6.0936

------------------------------------------------------------------------------
        MEDV | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        CRIM |  -.2110231   .0340656    -6.19   0.000    -.2779518   -.1440944
          RM |   8.032838   .4020063    19.98   0.000     7.243016     8.82266
         AGE |  -.0522428   .0104628    -4.99   0.000     -.072799   -.0316866
       _cons |  -23.60556    2.76938    -8.52   0.000    -29.04656   -18.16456
------------------------------------------------------------------------------

You can also load multiple NumPy arrays or pandas DataFrames into Stata at once by specifying the -f argument. This will load each array or DataFrame into a separate frame in Stata. To demonstrate, we partition the boston DataFrame into two subsets, CHAS0 and CHAS1, based on the CHAS column.

[11]:
CHAS0 = boston[boston['CHAS']==0]
CHAS1 = boston[boston['CHAS']==1]

Then, we simultaneously load each DataFrame into a Stata frame with the same name.

[12]:
%%stata -f CHAS0,CHAS1
frames dir
* CHAS0    471 x 14
* CHAS1    35 x 14
* default  506 x 14

Note: Frames marked with * contain unsaved data.

Now we have the complete dataset and the two subsets all in memory.

The mata magic

The %%mata magic command is used to execute Mata code from within Python. For example, here we create the matrix X in Mata and then obtain its inverse, Xi. Then, we multiply Xi by the original matrix, X.

[13]:
%%mata
X = (76, 53, 48 \ 53, 88, 46 \ 48, 46, 63)
Xi = invsym(X)
Xi
Xi*X

. mata
------------------------------------------------- mata (type end to exit) -----
: X = (76, 53, 48 \ 53, 88, 46 \ 48, 46, 63)

: Xi = invsym(X)

: Xi
[symmetric]
                  1              2              3
    +----------------------------------------------+
  1 |   .0298458083                                |
  2 |  -.0098470272    .0216268926                 |
  3 |  -.0155497706   -.0082885675    .0337724301  |
    +----------------------------------------------+

: Xi*X
                  1              2              3
    +----------------------------------------------+
  1 |             1   -1.11022e-16   -1.11022e-16  |
  2 |  -1.11022e-16              1              0  |
  3 |             0              0              1  |
    +----------------------------------------------+

: end
-------------------------------------------------------------------------------

.

You can load one or more NumPy arrays into Mata as Mata matrices by specifying the -m argument; similarly, you can push one or more Mata matrices into Python as NumPy arrays by using the -outm argument.

Below, we generate a 3x3 NumPy array, npmat, and then calculate its inverse, storing it in npmat_inv.

[14]:
import numpy as np

np.random.seed(17)
npmat = np.random.random((3,3))
npmat_inv = np.linalg.inv(npmat)
[15]:
npmat, npmat_inv
[15]:
(array([[0.294665  , 0.53058676, 0.19152079],
        [0.06790036, 0.78698546, 0.65633352],
        [0.6375209 , 0.57560289, 0.03906292]]),
 array([[-11.67027493,   3.01012075,   6.64203063],
        [ 13.98144021,  -3.71879942,  -6.06620635],
        [-15.55729514,   5.67128247,   6.58662066]]))

Next, we load the NumPy array npmat into Mata as a matrix and calculate its inverse using Mata. Then, we push the inverse matrix to Python as a NumPy array named mata_inv.

[16]:
%%mata -m npmat -outm mata_inv
npmat
mata_inv = luinv(npmat)

. mata
------------------------------------------------- mata (type end to exit) -----
: npmat
                 1             2             3
    +-------------------------------------------+
  1 |  .2946650027   .5305867556   .1915207869  |
  2 |  .0679003582     .78698546   .6563335218  |
  3 |   .637520896   .5756028938   .0390629162  |
    +-------------------------------------------+

: mata_inv = luinv(npmat)

: end
-------------------------------------------------------------------------------

.

Below, we display the mata_inv array. As you can see, this array is the same as the one we calculated in Python, npmat_inv.

[17]:
mata_inv
[17]:
array([[-11.67027493,   3.01012075,   6.64203063],
       [ 13.98144021,  -3.71879942,  -6.06620635],
       [-15.55729514,   5.67128247,   6.58662066]])